Generated at 2026-02-22T02:22:35.010703+00:00
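Workload types differ (inference vs. training), so treat metric winners in cross-workload comparisons with caution. In particular, MiniCluster reports 0.0000 for latency_p50_ms, latency_p95_ms, latency_p99_ms, and memory_utilization_percent; the resulting -100.00% deltas may reflect metrics that were not measured for its workload rather than real improvements.
- Overall winner: MiniCluster, which won 8 of 9 comparable metrics.
- Largest deltas: performance_per_watt (+755.27%), throughput_samples_per_sec (+518.19%), latency_p50_ms (-100.00%).
- No regressions exceeded the 5.0% threshold.
- Consistency findings: none.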
| Metric | AutoPerfPy | MiniCluster | Abs Delta | % Delta | Winner |
|---|---|---|---|---|---|
| communication_overhead_percent | N/A | N/A | N/A | N/A | N/A |
| decode_tpt_ms | N/A | N/A | N/A | N/A | N/A |
| energy_per_step_joules | 7.3482 | 4.1100 | 3.2383 | -44.07% | MiniCluster |
| latency_p50_ms | 42.6238 | 0.0000 | 42.6238 | -100.00% | MiniCluster |
| latency_p95_ms | 54.6869 | 0.0000 | 54.6869 | -100.00% | MiniCluster |
| latency_p99_ms | 55.7591 | 0.0000 | 55.7591 | -100.00% | MiniCluster |
| memory_utilization_percent | 96.5667 | 0.0000 | 96.5667 | -100.00% | MiniCluster |
| performance_per_watt | 0.4555 | 3.8956 | 3.4401 | +755.27% | MiniCluster |
| power_consumption_watts | 29.0711 | 27.1027 | 1.9684 | -6.77% | MiniCluster |
| scaling_efficiency_pct | N/A | N/A | N/A | N/A | N/A |
| temperature_celsius | 38.1400 | 37.4586 | 0.6814 | -1.79% | AutoPerfPy |
| throughput_samples_per_sec | 17.0790 | 105.5810 | 88.5020 | +518.19% | MiniCluster |
| tokens_per_sec | N/A | N/A | N/A | N/A | N/A |
| ttft_ms | N/A | N/A | N/A | N/A | N/A |
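The Abs Delta and % Delta columns appear to follow the standard relative-change formula with AutoPerfPy as the baseline. The sketch below assumes % Delta = (MiniCluster - AutoPerfPy) / AutoPerfPy × 100 and Abs Delta = |MiniCluster - AutoPerfPy|; the helper names are hypothetical and not part of either tool.

```python
def percent_delta(baseline: float, candidate: float) -> float:
    """Signed relative change of candidate vs. baseline, in percent (hypothetical helper)."""
    if baseline == 0:
        # A zero baseline has no defined relative change.
        raise ValueError("percent delta is undefined for a zero baseline")
    return (candidate - baseline) / baseline * 100.0


def abs_delta(baseline: float, candidate: float) -> float:
    """Unsigned absolute difference between the two results."""
    return abs(candidate - baseline)


# (AutoPerfPy, MiniCluster) values copied from the metric table.
RAW_VALUES = {
    "energy_per_step_joules": (7.3482, 4.1100),
    "latency_p50_ms": (42.6238, 0.0000),
    "power_consumption_watts": (29.0711, 27.1027),
    "throughput_samples_per_sec": (17.0790, 105.5810),
}

for metric, (autoperfpy, minicluster) in RAW_VALUES.items():
    print(
        f"{metric}: abs={abs_delta(autoperfpy, minicluster):.4f} "
        f"pct={percent_delta(autoperfpy, minicluster):+.2f}%"
    )
```

Because the table shows rounded inputs, a recomputed absolute delta can differ in the last digit (7.3482 - 4.1100 gives 3.2382, versus the reported 3.2383), while the percent deltas match to two decimals.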
Consolidated graph views for quick comparison validation:
- Top Normalized Deltas: positive values favor MiniCluster; negative values favor AutoPerfPy.
- Metric Family Deltas: family-level signed summary of normalized deltas.
- Winner Distribution: share of outcomes across comparable metrics.
- Confidence Distribution: data-strength breakdown for metric conclusions.
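The Winner Distribution view summarizes outcome shares. A minimal sketch, assuming the shares come from counting the Winner column of the metric table over the nine comparable metrics (N/A rows excluded); the helper name is hypothetical.

```python
from collections import Counter

# Winner column for the nine comparable metrics in the metric table (N/A rows omitted).
WINNERS = {
    "energy_per_step_joules": "MiniCluster",
    "latency_p50_ms": "MiniCluster",
    "latency_p95_ms": "MiniCluster",
    "latency_p99_ms": "MiniCluster",
    "memory_utilization_percent": "MiniCluster",
    "performance_per_watt": "MiniCluster",
    "power_consumption_watts": "MiniCluster",
    "temperature_celsius": "AutoPerfPy",
    "throughput_samples_per_sec": "MiniCluster",
}


def winner_shares(winners: dict[str, str]) -> dict[str, float]:
    """Percentage of comparable metrics won by each side."""
    counts = Counter(winners.values())
    total = sum(counts.values())
    return {side: count / total * 100.0 for side, count in counts.items()}


for side, share in sorted(winner_shares(WINNERS).items()):
    print(f"{side}: {share:.1f}%")
```

This prints AutoPerfPy: 11.1% and MiniCluster: 88.9%, consistent with the 8-of-9 summary.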
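Positive normalized deltas indicate an advantage for MiniCluster; negative values favor AutoPerfPy. Direction states which way is better for each metric (high: higher is better; low: lower is better; context: reported for context and not scored). Normalized Delta % keeps the raw sign for high metrics, flips it for low metrics, and is 0 for context metrics.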
| Metric | Family | Direction | Raw Delta % | Normalized Delta % | Advantage |
|---|---|---|---|---|---|
| performance_per_watt | performance | high | +755.27% | +755.27% | MiniCluster |
| throughput_samples_per_sec | performance | high | +518.19% | +518.19% | MiniCluster |
| latency_p50_ms | latency | low | -100.00% | +100.00% | MiniCluster |
| latency_p95_ms | latency | low | -100.00% | +100.00% | MiniCluster |
| latency_p99_ms | latency | low | -100.00% | +100.00% | MiniCluster |
| energy_per_step_joules | efficiency | low | -44.07% | +44.07% | MiniCluster |
| power_consumption_watts | efficiency | low | -6.77% | +6.77% | MiniCluster |
| temperature_celsius | other | high | -1.79% | -1.79% | AutoPerfPy |
| memory_utilization_percent | memory | context | -100.00% | +0.00% | context |
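The normalization rule can be read directly off the table: low-direction metrics have their raw sign flipped, high-direction metrics keep it, and context metrics are pinned to zero. A minimal sketch of that rule; the function names and the "tie" label are hypothetical, since no ties appear in this report.

```python
def normalized_delta(raw_pct: float, direction: str) -> float:
    """Re-sign a raw % delta so that positive always favors MiniCluster."""
    if direction == "high":      # higher is better: keep the sign
        return raw_pct
    if direction == "low":       # lower is better: flip the sign
        return -raw_pct
    if direction == "context":   # reported for context only: not scored
        return 0.0
    raise ValueError(f"unknown direction: {direction!r}")


def advantage(norm_pct: float, direction: str) -> str:
    """Map a normalized delta to the Advantage label used in the table."""
    if direction == "context":
        return "context"
    if norm_pct > 0:
        return "MiniCluster"
    if norm_pct < 0:
        return "AutoPerfPy"
    return "tie"  # hypothetical label; not used in this report


for metric, raw, direction in [
    ("latency_p50_ms", -100.00, "low"),
    ("temperature_celsius", -1.79, "high"),
    ("memory_utilization_percent", -100.00, "context"),
]:
    norm = normalized_delta(raw, direction)
    print(f"{metric}: {norm:+.2f}% -> {advantage(norm, direction)}")
```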
Family-level mean of normalized metric deltas (context-only metrics excluded).
| Family | Metrics | Normalized Delta % | Winner |
|---|---|---|---|
| performance | 2 | +636.73% | MiniCluster |
| latency | 3 | +100.00% | MiniCluster |
| efficiency | 2 | +25.42% | MiniCluster |
| other | 1 | -1.79% | AutoPerfPy |
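The family rows are plain arithmetic means of the normalized deltas, for example performance = (755.27 + 518.19) / 2 = 636.73. A minimal sketch that reproduces the table, assuming the winner is simply the sign of the family mean; the helper name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (metric, family, direction, normalized delta %) from the normalized-delta table.
NORMALIZED_ROWS = [
    ("performance_per_watt", "performance", "high", 755.27),
    ("throughput_samples_per_sec", "performance", "high", 518.19),
    ("latency_p50_ms", "latency", "low", 100.00),
    ("latency_p95_ms", "latency", "low", 100.00),
    ("latency_p99_ms", "latency", "low", 100.00),
    ("energy_per_step_joules", "efficiency", "low", 44.07),
    ("power_consumption_watts", "efficiency", "low", 6.77),
    ("temperature_celsius", "other", "high", -1.79),
    ("memory_utilization_percent", "memory", "context", 0.00),
]


def family_summary(rows):
    """Return {family: (metric count, mean normalized delta %, winner)}."""
    groups = defaultdict(list)
    for _metric, family, direction, norm in rows:
        if direction == "context":  # context-only metrics are excluded
            continue
        groups[family].append(norm)
    summary = {}
    for family, values in groups.items():
        avg = mean(values)
        # "tie" is a hypothetical label; no family mean is exactly zero here.
        winner = "MiniCluster" if avg > 0 else "AutoPerfPy" if avg < 0 else "tie"
        summary[family] = (len(values), avg, winner)
    return summary


for family, (count, avg, winner) in family_summary(NORMALIZED_ROWS).items():
    print(f"{family}: {count} metrics, {avg:+.2f}% -> {winner}")
```

Excluding context metrics is why the memory family has no row in the table above.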
Confidence reflects whether each metric is available in both results: strong when both report it, none when neither does.
| Metric | Family | AutoPerfPy Available | MiniCluster Available | Direction | Confidence |
|---|---|---|---|---|---|
| communication_overhead_percent | communication | no | no | low | none |
| decode_tpt_ms | other | no | no | low | none |
| energy_per_step_joules | efficiency | yes | yes | low | strong |
| latency_p50_ms | latency | yes | yes | low | strong |
| latency_p95_ms | latency | yes | yes | low | strong |
| latency_p99_ms | latency | yes | yes | low | strong |
| memory_utilization_percent | memory | yes | yes | context | strong |
| performance_per_watt | performance | yes | yes | high | strong |
| power_consumption_watts | efficiency | yes | yes | low | strong |
| scaling_efficiency_pct | other | no | no | high | none |
| temperature_celsius | other | yes | yes | high | strong |
| throughput_samples_per_sec | performance | yes | yes | high | strong |
| tokens_per_sec | other | no | no | high | none |
| ttft_ms | other | no | no | low | none |
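A minimal sketch of the availability-based confidence rule, assuming N/A is represented as None. The report only shows the strong and none levels, so the label for one-sided availability below is hypothetical. Note that a reported 0.0 counts as available, mirroring the table, where MiniCluster's 0.0000 latency values are marked available with strong confidence.

```python
from typing import Optional


def is_available(value: Optional[float]) -> bool:
    """A metric is available if any value was reported; None stands for N/A (0.0 counts)."""
    return value is not None


def confidence(autoperfpy: Optional[float], minicluster: Optional[float]) -> str:
    a, m = is_available(autoperfpy), is_available(minicluster)
    if a and m:
        return "strong"
    if not a and not m:
        return "none"
    return "partial"  # hypothetical label: one-sided availability does not occur in this report


for metric, a_val, m_val in [
    ("latency_p50_ms", 42.6238, 0.0),
    ("tokens_per_sec", None, None),
]:
    print(f"{metric}: {confidence(a_val, m_val)}")
```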
Either no consistency regressions were detected, or the all-reduce step data needed for the check was unavailable.